Abstract. The objectives of this research work, which is intimately related to
pattern discovery and management, are threefold: (i) handle the problem of pattern
manipulation by defining operations on patterns, (ii) study the problem of enrich- 
ing and updating a pattern set (e.g., concepts, rules) when changes occur in the 
user's needs and the input data (e.g., object/attribute insertion or elimination, tax- 
onomy utilization), and (iii) approximate a "presumed" concept using a related 
pattern space so that patterns can augment data with knowledge. To conduct our 
work, we use formal concept analysis (FCA) as a framework for pattern discov- 
ery and management and we take a joint database-FCA perspective by defining 
operators similar in spirit to relational algebra operators, investigating approxi- 
mation in concept lattices and exploiting existing work related to operations on 
contexts and lattices to formalize such operators. 



1 Introduction 

The recent research topic of pattern discovery and management refers to a set 
of activities related to the extraction, description, manipulation and storage of 
patterns in a similar (but more elaborated) way as data are managed by database 
applications. In pattern management and inductive databases [4, 5, 16, 24],
patterns are knowledge artifacts (e.g., association rules, clusters) extracted from
data using data mining procedures (generally run in advance), and retrieved 
upon user's request. A pattern is then a concise and semantically rich repre- 
sentation of raw data. An example of a pattern could be a cluster that represents 
a set of Star Alliance members with their common features (e.g., fleet size, set 
of destinations). 

In many database and data warehouse applications, users tend to be drow- 
ning in data and even in patterns while they are actually interested in a very lim- 
ited set of knowledge pieces. Moreover, the scope of patterns to explore differs 
from one user to another and changes over time. Finally, one is frequently
interested in an exploratory and iterative process of data mining (DM) to discover
patterns under different scenarios and different hypotheses. To reduce the
overload that the large set of mined patterns places on the user's memory and
working space, we propose to define a set of algebraic operators similar
in spirit to operators of relational algebra. Such operators will allow "data min- 
ing on demand" (i.e., data mining according to user's needs and perspectives) 
and rely on key operations on concept lattices such as selection, projection and 
join. Additional operations will be defined either to enrich the pattern basis or 
to identify the patterns that best approximate a "presumed" concept. 
The following example is an elementary way to display information. It presents 
the Star Alliance members in year 2000 with their destinations [El. 
Fig. 1. Star Alliance members and their flying destinations in year 2000.



2 Contexts, Concept Lattices and their Ideals 

Formal Concept Analysis is a branch of applied mathematics, which is based on 
a formalization of concept and concept hierarchy 0. It has been successfully 
used for conceptual clustering and rule generation. Let K = (G, M, I) be a 
formal context, where G, M and I stand for a set of objects, a set of attributes,
and a binary relation between G and M, respectively. Two functions, $f_1$ and $f_2$,
summarize the links between subsets of objects and subsets of attributes induced
by I. Function $f_1$ maps a set of objects to the set of their common attributes,
whereas $f_2$ is the dual for attribute sets:

$f_1 : \mathcal{P}(G) \to \mathcal{P}(M), \quad A \mapsto A' := \{a \in M \mid \forall o \in A,\ o \mathrel{I} a\},$
$f_2 : \mathcal{P}(M) \to \mathcal{P}(G), \quad B \mapsto B' := \{o \in G \mid \forall a \in B,\ o \mathrel{I} a\}.$

Furthermore, the compound operators $f_2 \circ f_1$ and $f_1 \circ f_2$ (both denoted by '') are
closure operators on G and M, respectively. In particular, $Z \subseteq Z''$ and
$(Z'')'' = Z''$ for $Z \in \mathcal{P}(M) \cup \mathcal{P}(G)$. The set Z is closed if $Z'' = Z$.
A formal concept c is a pair of sets (A, B) with $A \subseteq G$, $B \subseteq M$, $A = B'$
and $B = A'$. A is called the extent of c (denoted by ext(c)) and B its intent
(denoted by int(c)). In the closed itemset mining framework [18, 29], A and B
correspond to the notions of closed tidset and closed itemset, respectively. The
set of all concepts of K is denoted by $\mathfrak{B}(K)$. Ordered by
$(A, B) \le (C, D) :\iff A \subseteq C$, it forms a complete lattice³, called the
concept lattice of K and denoted by $\underline{\mathfrak{B}}(K)$. For (A, B) and
(C, D) in $\mathfrak{B}(K)$ we have

$(A, B) \vee (C, D) = ((A \cup C)'', B \cap D)$ and $(A, B) \wedge (C, D) = (A \cap C, (B \cup D)'')$.
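
The derivation operators and the join and meet formulas above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the three-object context `CONTEXT` and all function names are hypothetical and do not come from the paper, and the concepts are enumerated by brute force rather than by a dedicated lattice algorithm.

```python
from itertools import combinations

# Toy formal context (hypothetical data): object -> set of its attributes.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"a", "c"},
    "g3": {"a", "b", "c"},
}
OBJECTS = set(CONTEXT)
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objs):
    """f1: A -> A', the attributes shared by every object of A."""
    return {m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objs)}


def common_objects(attrs):
    """f2: B -> B', the objects having every attribute of B."""
    return {g for g in OBJECTS if attrs <= CONTEXT[g]}


def concepts():
    """All pairs (A, B) with A = B' and B = A', found by closing every object subset."""
    found = set()
    for k in range(len(OBJECTS) + 1):
        for objs in combinations(sorted(OBJECTS), k):
            intent = common_attributes(set(objs))
            extent = common_objects(intent)  # the closure A''
            found.add((frozenset(extent), frozenset(intent)))
    return found


def join(c, d):
    """(A, B) v (C, D) = ((A u C)'', B n D); note (A u C)'' = (B n D)'."""
    intent = c[1] & d[1]
    return (frozenset(common_objects(intent)), frozenset(intent))


def meet(c, d):
    """(A, B) ^ (C, D) = (A n C, (B u D)''); note (B u D)'' = (A n C)'."""
    extent = c[0] & d[0]
    return (frozenset(extent), frozenset(common_attributes(extent)))
```

Closing every subset of objects is exponential in the number of objects, so the sketch is only meant for contexts small enough to inspect by hand.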

The sets G and M are related to $\mathfrak{B}(G, M, I)$ by the following mappings, where
x' stands for {x}' with $x \in G \cup M$:

$\gamma : G \to \mathfrak{B}(G, M, I), \quad g \mapsto \gamma g := (g'', g')$ and
$\mu : M \to \mathfrak{B}(G, M, I), \quad m \mapsto \mu m := (m', m'')$,

with $g \mathrel{I} m \iff \gamma g \le \mu m$. The $\gamma g$'s and the $\mu m$'s form the building blocks of
the concept lattice. In fact, any concept is the join of some $\gamma g$'s and the meet of
some $\mu m$'s; i.e. if c is a concept of (G, M, I), then there are sets $C_1 \subseteq G$ and
$C_2 \subseteq M$ such that $c = \bigvee\{\gamma g \mid g \in C_1\}$ and $c = \bigwedge\{\mu m \mid m \in C_2\}$. We call
$C_1$ a generator of the extent of c, and $C_2$ a generator of the intent of c. In fact
$C_1'' = \mathrm{ext}(c) = \{g \in G \mid \gamma g \le c\}$ and $C_2'' = \mathrm{int}(c) = \{m \in M \mid \mu m \ge c\}$.

³ This is a poset in which every subset X has an infimum ($\bigwedge X$) and a supremum ($\bigvee X$). We set
$a \wedge b := \bigwedge\{a, b\}$ and $a \vee b := \bigvee\{a, b\}$.

Fig. 2. Concept lattice of the context of Figure 1.

Ideals and filters play an important role in describing selection and
approximation on concepts. An order ideal is a downward closed subposet. For a poset
$(P, \le)$ and $X \subseteq P$, the intersection of all order ideals containing X is the
smallest order ideal containing X. It is called the order ideal generated by X
and denoted by $\downarrow X$. If X = {a} then $\downarrow a := \downarrow X = \{x \in P \mid x \le a\}$,
and is called a principal ideal. Dually, an order filter is an upward closed
subposet. For $X \subseteq P$, the intersection of all order filters containing X is an order
filter containing X, called the order filter generated by X. If X = {a} then
$\uparrow a := \uparrow X = \{x \in P \mid x \ge a\}$ is called a principal filter. In a lattice, a lattice
ideal or simply ideal is an order ideal closed under finite suprema. For $X \subseteq P$,
the intersection of all lattice ideals containing X is again a lattice ideal
containing X, called the ideal generated by X. It always contains the order ideal
generated by X. All principal ideals are also lattice ideals. If x, y are
incomparable in L, then the order ideal generated by {x, y} is $\downarrow x \cup \downarrow y$, which is not a
lattice ideal. It is smaller than the lattice ideal generated by {x, y}. The notion
of lattice filter or simply filter is defined dually Q. 
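
The difference between the order ideal and the lattice ideal generated by two incomparable elements can be checked on a small lattice. The sketch below uses the divisors of 12 ordered by divisibility, where the join is the least common multiple; this example lattice and the function names are assumptions chosen for illustration and do not appear in the paper.

```python
from math import gcd

# Hypothetical example lattice: divisors of 12, with x <= y iff x divides y.
DIVISORS = [d for d in range(1, 13) if 12 % d == 0]  # 1, 2, 3, 4, 6, 12


def leq(x, y):
    """Order relation of the poset: x divides y."""
    return y % x == 0


def order_ideal(xs):
    """Down-set generated by xs: every element below some x in xs."""
    return {p for p in DIVISORS if any(leq(p, x) for x in xs)}


def order_filter(xs):
    """Up-set generated by xs: every element above some x in xs."""
    return {p for p in DIVISORS if any(leq(x, p) for x in xs)}


def lattice_ideal(xs):
    """Smallest down-set containing xs that is closed under binary joins (lcm)."""
    ideal = order_ideal(xs)
    changed = True
    while changed:
        changed = False
        for a in list(ideal):
            for b in list(ideal):
                j = a * b // gcd(a, b)  # join of a and b in this lattice
                if j not in ideal:
                    ideal |= order_ideal({j})
                    changed = True
    return ideal
```

For the incomparable elements 2 and 3, `order_ideal({2, 3})` misses their join 6, whereas `lattice_ideal({2, 3})` contains it, which is the strict inclusion stated in the text.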

3 Information Systems 

Frequently, data are not directly encoded in a "binary" form, but rather as a
many-valued context in the form of a tuple (G, M, W, I) of sets such that
$I \subseteq G \times M \times W$, where $(g, m, w_1) \in I$ and $(g, m, w_2) \in I$ imply $w_1 = w_2$. G is called
the set of objects, M the set of attributes (or attribute names) and W the set of
attribute values. If $(g, m, w) \in I$, then w is the value of the attribute m for the
object g. Another notation is m(g) = w, where m is a partial map from G to W.
Many-valued contexts can be transformed into binary contexts via conceptual
scaling. A conceptual scale for an attribute m of (G, M, W, I) is a binary
context $S_m := (G_m, M_m, I_m)$ such that $m(G) \subseteq G_m$. Intuitively, $M_m$ discretizes
or groups the attribute values in m(G), and $I_m$ describes how each attribute
value m(g) is related to the elements of $M_m$. For an attribute m of (G, M, W, I)
and a conceptual scale $S_m$ we derive a binary context $K_m := (G, M_m, I^m)$
with $g \mathrel{I^m} s_m :\iff m(g) \mathrel{I_m} s_m$, where $s_m \in M_m$. This means that an
object $g \in G$ is in relation with a scaled attribute $s_m$ iff the value of m on g is
in relation with $s_m$ in $S_m$. With a conceptual scale for each attribute we get
the derived context $K^S := (G, N, I^S)$ where $N := \bigcup\{M_m \mid m \in M\}$ and
$g \mathrel{I^S} s_m :\iff m(g) \mathrel{I_m} s_m$. In practice, the set of objects remains unchanged;
each attribute name m is replaced by the scaled attributes $s_m \in M_m$. An
information system is a many-valued context (G, M, W, I) with a set of scales
$(S_m)_{m \in M}$. The choice of a suitable set of scales depends on the interpretation,
and is usually done with the help of a domain expert. A Conceptual Information
System is a one-valued (or many-valued) context together with a set of
conceptual scales (or hierarchies). Such a set of scales is called a conceptual schema
[12, 22]. Other scaling methods have also been proposed (see, e.g., [19, 20]).
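
As a rough illustration of conceptual scaling, the sketch below derives a binary context from a many-valued one. Everything in it is assumed for the example: the airline names, the attribute names `fleet` and `hub`, and the encoding of each scale as a dictionary from scaled attribute to a predicate on raw values.

```python
# Many-valued context (hypothetical data): object -> {attribute name: value}.
MANY_VALUED = {
    "AirlineA": {"fleet": 240, "hub": "Toronto"},
    "AirlineB": {"fleet": 95, "hub": "Vienna"},
    "AirlineC": {"fleet": 310},  # no "hub" value: m is only a partial map
}

# One scale per attribute name: scaled attribute -> predicate on the raw value.
SCALES = {
    "fleet": {  # an ordinal-style scale
        "fleet>=100": lambda v: v >= 100,
        "fleet>=300": lambda v: v >= 300,
    },
    "hub": {  # a nominal scale
        "hub=Toronto": lambda v: v == "Toronto",
        "hub=Vienna": lambda v: v == "Vienna",
    },
}


def derive_context(many_valued, scales):
    """Derived binary context: g gets s_m iff m(g) is defined and related to s_m."""
    derived = {}
    for g, row in many_valued.items():
        derived[g] = {
            s
            for m, scale in scales.items()
            if m in row
            for s, holds in scale.items()
            if holds(row[m])
        }
    return derived
```

The object set is unchanged and each attribute name is replaced by its scaled attributes, as described above; an object with an undefined value simply receives none of the scaled attributes for that name.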

4 Relations and Relation Schema 

We first recall key notions on relation schema and relational algebra [1, 15]. A
relation scheme is a set of attribute names $R := \{A_1, \ldots, A_n\}$, denoted by
$R(A_1, \ldots, A_n)$ or simply R, where $n \in \mathbb{N}$ is the arity of R. For each attribute
name $A_i$, there is a set $\mathrm{dom}A_i$, called the domain of $A_i$. A relation on the scheme
R, denoted by r(R) or $r(A_1, \ldots, A_n)$ or simply r, is a set⁴ $\{t_1, \ldots, t_p\}$ of
mappings from R to $D := \bigcup\{\mathrm{dom}A_i \mid 1 \le i \le n\}$ such that $t(A_i) \in \mathrm{dom}A_i$
for any tuple $t \in r$. The A-value of t is t(A), and more generally, the Y-value of
t is t(Y), where $Y \subseteq \{A_1, \ldots, A_n\}$. A relation r can be interpreted as a table,
where the rows are its tuples and the columns are headed by the attribute names.
We assume that each relation r has an attribute name K that is a key (i.e., for
$i, j \in \{1, \ldots, p\}$, $t_i \ne t_j$ iff $t_i(K) \ne t_j(K)$). Then a relation r on a scheme R
is nothing else than a many-valued context (G, R, D, I) with

$G := \{t(K) \mid t \in r\}$ and $(t(K), m, w) \in I :\iff t(m) = w$.

Each many-valued context can be transformed into a binary context using a set
of scales $(S_m)_{m \in M}$. Therefore, to each relation r we can associate a formal
context K(r) and a concept lattice $\mathfrak{B}(r)$, which depends on the chosen scales
$(S_m)_{m \in M}$.

To handle the information stored in relations, a relational algebra has been
defined to allow operations on relations (see for example [1, 15]). Our aim is to
see how these operations can be encoded on concept lattices. Before we proceed,
let us first recall some key operations. We start with a logic on the attributes.
Let r be a relation on a schema R. The atomic formulae are of the form A=a
for $A \in R$ and $a \in \mathrm{dom}A$. The connectors $\wedge$, $\vee$ and $\neg$ are defined as usual.
Examples of formulae are

$\varphi_1 := (A{=}a \wedge B{=}b)$; $\varphi_2 := (A{=}a \vee B{=}b)$ and $\varphi_3 := \neg(A{=}a)$.

⁴ All relations considered here are finite.

Selection. We consider a relation $r(A_1, \ldots, A_n)$ and (G, M, I) a binary context
derived from r via the scales $(S_A)_{A \in R}$. We are interested in a relation whose
tuples are those of r with a certain value a on a specified attribute $A_j$. This is
the selection operation denoted by $\mathrm{Select}(r, A_j{=}a)$ or $\sigma_{A_j=a}(r)$ and defined by

$\mathrm{Select}(r, A_j{=}a) := \{t \in r \mid t(A_j) = a\}$.

This operation gives a special sub-context of (G, M, I) with all attributes from
M and all those objects in G that have the scaled attribute⁵ a. On the concept
lattice side, the above selection operation can be expressed as $\mathrm{Select}(\mathfrak{B}(r), A_j{=}a)$
and corresponds to the order ideal $\downarrow \mu a$. In case of a conjunctive condition on
atomic formulae, $\varphi_\wedge := \bigwedge\{A_j{=}a_j \mid j \in J \subseteq \{1, \ldots, n\}\}$, the operation
$\mathrm{Select}(\mathfrak{B}(r), \varphi_\wedge)$ gives the ideal $\downarrow \bigwedge_{j \in J} \mu a_j$. For a disjunctive condition on
atomic formulae, $\varphi_\vee := \bigvee\{A_j{=}a_j \mid j \in J \subseteq \{1, \ldots, n\}\}$, the operation
$\mathrm{Select}(\mathfrak{B}(r), \varphi_\vee)$ gives the order ideal $\bigcup_{j \in J} \downarrow \mu a_j$. The case of negation
must be handled with care. For a concept (A, B) of (G, M, I), its negation
(resp. weak opposition) is the concept $((G \setminus A)'', (G \setminus A)')$ (resp.
$((M \setminus B)', (M \setminus B)'')$) [13, 27, 28]. $\mathrm{Select}(\mathfrak{B}(r), \neg(A{=}a))$ will produce all concepts
of K(r) whose objects do not have the A-value a. This is not necessarily an order
ideal of $\mathfrak{B}(r)$. If $G \setminus \{t(K) \mid t(A) = a\}$ is closed, then $\mathrm{Select}(\mathfrak{B}(r), \neg(A{=}a))$
is an order ideal. Otherwise, the output of $\mathrm{Select}(\mathfrak{B}(r), \neg(A{=}a))$ is only a
sub-hierarchy of the order ideal generated by $\gamma(G \setminus \{t(K) \mid t(A) = a\})$.
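
A selection on a concept lattice can be sketched as a filter on extents: a concept lies in the ideal generated by the meet of the attribute concepts of a conjunction exactly when its extent is contained in the set of objects having all the selected scaled attributes. The code below is a minimal sketch under assumptions; the scaled context, the attribute labels such as `A=a`, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical scaled context: tuple key -> scaled attributes "name=value".
CONTEXT = {
    "t1": {"A=a", "B=b"},
    "t2": {"A=a", "B=c"},
    "t3": {"A=d", "B=b"},
}
ATTRS = set().union(*CONTEXT.values())


def extent(attrs):
    """Objects having every attribute of attrs."""
    return frozenset(g for g, row in CONTEXT.items() if attrs <= row)


def intent(objs):
    """Attributes shared by every object of objs."""
    return frozenset(m for m in ATTRS if all(m in CONTEXT[g] for g in objs))


def concepts():
    """All formal concepts, obtained by closing every attribute subset."""
    out = set()
    for k in range(len(ATTRS) + 1):
        for attrs in combinations(sorted(ATTRS), k):
            e = extent(set(attrs))
            out.add((e, intent(e)))
    return out


def select(conjunction):
    """Order ideal below the meet of the attribute concepts of the conjunction."""
    bound = extent(set(conjunction))  # extent of that meet
    return {c for c in concepts() if c[0] <= bound}


def select_any(disjunction):
    """Union of the principal ideals of the attribute concepts of the disjunction."""
    return set().union(*(select({a}) for a in disjunction))
```

Negation is deliberately left out of the sketch, since the text notes that its result is not necessarily an order ideal.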

⁵ If $S_{A_j}$ is not a nominal scale, some objects that do not have exactly the $A_j$-value a can also
be chosen, provided their $A_j$-value is in relation with the scaled attribute $s_a$ that represents
the group to which a belongs.

Projection. Let $r(A_1, \ldots, A_n)$ be a relation and $Y \subseteq \{A_1, \ldots, A_n\}$. Then
$\mathrm{Project}(r, Y) = \Pi_Y(r) = \{t(Y) \mid t \in r\}$ restricts the tuples of r to the
attributes in Y. The projection defined above is equivalent to having a sub-context
(G, N, J) of (G, M, I) where N is the set of scaled values of the attributes in
Y. Two concepts $c_1$ and $c_2$ in $\mathfrak{B}(r)$ are Y-equivalent if $\mathrm{int}(c_1) \cap Y = \mathrm{int}(c_2) \cap Y$.
Every concept c has a greatest concept equivalent to it. Then Project(r, Y)
translates in FCA into a projection on concepts $\mathrm{Project}(\mathfrak{B}(r), Y)$ which is a copy of
the sub-hierarchy

$R_Y(r) := (\{c \in \mathfrak{B}(r) \mid c \text{ is the greatest element of its Y-equivalence class}\}, \le)$,

and is a $\wedge$-subsemilattice of $\mathfrak{B}(r)$. More precisely

$\mathrm{Project}(\mathfrak{B}(r), Y) = \{((\mathrm{int}(c) \cap Y)', \mathrm{int}(c) \cap Y) \mid c \in R_Y(r)\} = \mathfrak{B}(G, N, J)$.
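
The last equality can be checked on a small example: mapping each concept c to ((int(c) ∩ Y)', int(c) ∩ Y) should give exactly the concepts of the sub-context restricted to Y. The sketch below is an assumption-laden illustration; the three-row context and the function names are hypothetical, and for simplicity the map is applied to all concepts rather than only to class representatives, which yields the same set.

```python
from itertools import combinations

# Hypothetical scaled context: object -> destinations served.
CONTEXT = {
    "t1": {"Canada", "Europe", "Asia Pacific"},
    "t2": {"Canada", "Europe"},
    "t3": {"Europe", "Asia Pacific"},
}
ALL_ATTRS = frozenset({"Canada", "Europe", "Asia Pacific"})


def extent(attrs, context):
    """Objects of the context having every attribute of attrs."""
    return frozenset(g for g, row in context.items() if attrs <= row)


def intent(objs, context, attributes):
    """Attributes (among `attributes`) shared by every object of objs."""
    return frozenset(m for m in attributes if all(m in context[g] for g in objs))


def concepts(context, attributes):
    """All formal concepts of (objects of context, attributes, incidence)."""
    out = set()
    for k in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), k):
            e = extent(set(attrs), context)
            out.add((e, intent(e, context, attributes)))
    return out


def project(all_concepts, keep, context):
    """Map each concept c to ((int(c) & keep)', int(c) & keep)."""
    return {(extent(c[1] & keep, context), c[1] & keep) for c in all_concepts}


Y = frozenset({"Canada", "Asia Pacific"})
projected = project(concepts(CONTEXT, ALL_ATTRS), Y, CONTEXT)
restricted = {g: row & Y for g, row in CONTEXT.items()}  # the sub-context (G, N, J)
```

Comparing `projected` with `concepts(restricted, Y)` confirms the equality on this toy context.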

Figure 3 illustrates a projection on Y := {Canada, Asia Pacific} and a selection.
The four Y-equivalence classes induced by this projection are identified (left)
and their order displayed (right). The structure inside the lowest class (left)
displays the selection of Star Alliance members whose destinations include Canada
and Asia Pacific.




Fig. 3. Projection and selection. 



Natural Join. The most frequently used binary operation is the natural join. Let
r(R) and s(S) be two relations. The natural join of r and s, denoted by $r \bowtie s$
or Join(r, s), is the relation q(T) with schema $T = R \cup (S \setminus R)$ containing all
tuples t over T such that there are tuples $t_1 \in r$ and $t_2 \in s$ satisfying $t(R) = t_1$
and $t(S) = t_2$:

$\mathrm{Join}(r, s) = \{t_1|t_2 \text{ for } t_1 \in r \text{ and } t_2 \in s \text{ such that } t_1(R \cap S) = t_2(R \cap S)\}$,

where $t_1|t_2$ denotes the tuple of a relation on the scheme $R \cup (S \setminus R)$ defined
for every attribute name A by

$(t_1|t_2)(A) := \begin{cases} t_1(A) & \text{if } A \in R, \\ t_2(A) & \text{if } A \in S \setminus R. \end{cases}$

Note that $t_1|t_2$ is defined only for tuples that coincide on $R \cap S$.⁶ Those tuples
are called joinable. If $R \cap S$ is empty (i.e., there are no common attributes
between the two relation schemes), then $t_1|t_2$ is defined for all tuples of r
and of s, and Join(r, s) gives the Cartesian product $r \times s$. When the join
attribute is the identifier of objects in two contexts, then the relational join
operation is equivalent to the apposition (see Subsection 5.2) of $K_1 := (G, M_1, I_1)$
and $K_2 := (G, M_2, I_2)$ having the same set of objects to get a context
$K := (G, M_1 \cup M_2, I_1 \cup I_2)$. The lattice corresponding to the context K is
isomorphic to a subdirect product of the $\vee$-semilattices $\mathfrak{B}(K_1)$ and $\mathfrak{B}(K_2)$ and can be
expressed by nested line diagrams [9].
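
The definition of the natural join translates almost literally into code. The sketch below is only an illustration under assumptions: relations are represented as lists of dictionaries, every tuple of a relation is assumed to carry the full schema of that relation, and the sample attribute names are hypothetical.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    joined = []
    for t1 in r:
        for t2 in s:
            shared = t1.keys() & t2.keys()  # attribute names in R n S
            if all(t1[a] == t2[a] for a in shared):  # t1 and t2 are joinable
                # t1|t2: values of t1 on R, values of t2 on S \ R
                joined.append({**t2, **t1})
    return joined


# Hypothetical sample relations sharing the key attribute "id".
members = [{"id": 1, "member": "X"}, {"id": 2, "member": "Y"}]
hubs = [{"id": 1, "hub": "P"}, {"id": 3, "hub": "Q"}]
regions = [{"region": "North"}, {"region": "South"}]
```

With `members` and `hubs` only the tuples agreeing on `id` are combined, whereas joining `members` with `regions`, which share no attribute, yields the Cartesian product.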

Other binary operations on relations include the set-based operations such as
the intersection, the union and the difference ($r \cap s$, $r \cup s$, $r \setminus s$). These operations
will not be discussed in this paper. A query in the form of a relational algebra
expression [15] is a well-formed expression that contains a finite number of
relational algebra operators whose operands are relations.



5 Algebraic Operations on Contexts 



In the following, we define a kernel of main operations on contexts and lattices. 



5.1 Sub-contexts: local views of a context 



Given a context K, one can choose (or is required) to have a local view, by
restricting it to some objects and some attributes of K. In this subsection we
show how the concepts of a sub-context are related to those of the initial context
(see for example [9, Section 3.1]). Formally, for a context K := (G, M, I),
a sub-context is a triple $\mathbb{H} := (H, N, J)$ such that $H \subseteq G$, $N \subseteq M$ and
$J = I \cap (H \times N)$. The hierarchy of concepts in $\mathbb{H}$ can be seen as a sub-hierarchy
of the concepts in K. For each concept u := (U, V) of $\mathbb{H}$, its extent U is also a
subset of G and its closure U'' in K induces a concept (U'', U'). The intent V is
also a subset of M and its closure V'' in K induces a concept (V', V''). We set
$\varphi_1 u := (U'', U')$ and $\varphi_2 u := (V', V'')$. Then $\varphi_1$ and $\varphi_2$ define two mappings



6 We assume that the join attributes have the same name in the two tables R and S. 



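from $\mathfrak{B}(\mathbb{H})$ to $\mathfrak{B}(K)$ that are order embeddings, i.e. for $1 \le i \le 2$ we have,
for all $u, v \in \mathfrak{B}(\mathbb{H})$,

$u \le v \implies \varphi_i u \le \varphi_i v$ (preserving the order of $\mathfrak{B}(\mathbb{H})$),
$\varphi_i u \le \varphi_i v \implies u \le v$ (reflecting the order of $\mathfrak{B}(K)$).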



To avoid confusion, we usually replace the derivation ' by the name of the
relation of the context in which the derivation is done. Thus, for a subcontext
(H, N, J) of (G, M, I), $U \subseteq H$ and $V \subseteq N$ we have

$U^J = U^I \cap N$ and $V^J = V^I \cap H$. (†)

Next we compare the closures in $\mathbb{H}$ and in K. From the equalities in (†) above,
we have $U^J \subseteq U^I$ and $U^{JI} \supseteq U^{II}$. Therefore

$U^{JJ} = U^{JI} \cap H \supseteq U^{II} \cap H \supseteq U$. (‡)

This means that the closure of a set U of objects in $\mathbb{H}$ is usually larger than the
restriction on H of its closure in K. If U is closed in $\mathbb{H}$, we have the equality,
since (‡) leads to $U = U^{JJ} \supseteq U^{II} \cap H \supseteq U$ and $U = U^{II} \cap H$. Similarly we
get $V^{II} \cap N \subseteq V^{JJ}$ and $V^{II} \cap N = V$ if V is closed in $\mathbb{H}$. The context K in
Figure 4 and the sub-context $\mathbb{H}$ given by H := {2, 3, 5, 6} and N := {a, b, d, f}
illustrate the inclusion above. For U := {2, 3} we have $U^I = \{a, b, d, e\}$ and
$U^{II} = \{2, 3, 4\}$ as well as $U^J = \{a, b, d\}$ and $U^{JJ} = \{2, 3, 5\}$. Thus the
closures in $\mathbb{H}$ and K are not always comparable, but only their restrictions on
H. We have $U^{II} \cap H = \{2, 3\} \subseteq U^{JJ}$. In $\mathbb{H}$, {5} is closed since $5^{JJ} = \{5\}$;
also $5^{II} = \{4, 5\}$ and $5^{II} \cap H = \{5\} = 5^{JJ}$.

Fig. 4. A context K, its concept lattice $\mathfrak{B}(K)$ and a concept lattice of its sub-context.

We have seen that if U is an extent and V an intent of $\mathbb{H}$, then $U^{II} \cap H = U$
and $V^{II} \cap N = V$. Thus if (U, V) is a concept of $\mathbb{H}$, then $(U^{II}, U^I)$ and
$(V^I, V^{II})$ are concepts of K whose restrictions on $\mathbb{H}$ are equal to (U, V); i.e.
$(U^{II} \cap H, U^I \cap N) = (U, V)$ and $(V^I \cap H, V^{II} \cap N) = (U, V)$. Note that
$U^{II} \subseteq V^I$ and $(U^{II}, U^I) \le (V^I, V^{II})$. Therefore, for every concept (A, B) of K
in the interval $[(U^{II}, U^I); (V^I, V^{II})]$ we have $(A \cap H, B \cap N) = (U, V)$. This
means that the concepts of $\mathbb{H}$ can be identified with some intervals of K (see
Section 6). If every concept of K is in such an interval then the restriction on
$\mathbb{H}$ of every concept (A, B) of K is a concept $(A \cap H, B \cap N)$ of $\mathbb{H}$. The mapping

$\Pi_{\mathbb{H}} : \mathfrak{B}(G, M, I) \to \mathfrak{B}(H, N, J), \quad (A, B) \mapsto (A \cap H, B \cap N)$

is a surjective complete lattice homomorphism. The mapping $\Pi_{\mathbb{H}}$ from $\mathfrak{B}(K)$ to
$\mathfrak{B}(\mathbb{H})$, or simply from the initial context K to its sub-context $\mathbb{H}$, coincides with a
mixture of two relational operations: a selection of tuples t with $t(K) \in H$ and
a projection on attributes in N. Therefore, the mixture of selection and
projection operations commonly used in relational databases can be implemented for
concept lattice manipulation by means of the theory described above.
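
The comparison of closures in a context and in one of its sub-contexts can be replayed numerically. The context below is not the one of Figure 4, which is not reproduced in the text; it is a hypothetical context constructed so that it agrees with the values quoted in the example (H = {2, 3, 5, 6}, N = {a, b, d, f}, U = {2, 3}). The function names are assumptions as well.

```python
# Hypothetical context consistent with the values quoted in the text.
CONTEXT = {
    1: set("c"),
    2: set("abde"),
    3: set("abcde"),
    4: set("abdef"),
    5: set("abdf"),
    6: set("af"),
}
G, M = set(CONTEXT), set("abcdef")
H, N = {2, 3, 5, 6}, set("abdf")  # object and attribute sets of the sub-context


def common_attrs(objs, attr_universe):
    """Derivation of an object set, restricted to the given attribute universe."""
    return {m for m in attr_universe if all(m in CONTEXT[g] for g in objs)}


def common_objs(attrs, obj_universe):
    """Derivation of an attribute set, restricted to the given object universe."""
    return {g for g in obj_universe if attrs <= CONTEXT[g]}


U = {2, 3}
U_I = common_attrs(U, M)     # derivation in K
U_II = common_objs(U_I, G)   # closure in K
U_J = common_attrs(U, N)     # derivation in the sub-context
U_JJ = common_objs(U_J, H)   # closure in the sub-context
```

Here `U_II` and `U_JJ` are incomparable, while `U_II & H` is contained in `U_JJ`, in line with the chain of inclusions (‡).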

5.2 Enlarging contexts 

The reverse situation of local views is enlarging the context. One possibility is
to enlarge the set of attributes. Two contexts $K_1 := (G, M_1, I_1)$ and
$K_2 := (G, M_2, I_2)$ with the same set of objects can be combined to get a context
$K := (G, M_1 \uplus M_2, I_1 \cup I_2)$. The context K is called the apposition of $K_1$ and $K_2$,
and denoted by $K_1 | K_2$. The extents of the concepts in the resulting lattice are
exactly the intersections of extents of $K_1$ and $K_2$ [26]. In general two contexts
$(G_1, M_1, I_1)$ and $(G_2, M_2, I_2)$ can be put together to get a context (G, M, I)
with $G := G_1 \cup G_2$, $M := M_1 \cup M_2$ and $I := I_1 \cup I_2$ if they agree on their
intersections, i.e. for any $g \in G_1 \cap G_2$ and $m \in M_1 \cap M_2$ we have
$g \mathrel{I_1} m \iff g \mathrel{I_2} m$. Here $g \in G_1 \setminus G_2$ and $m \in M_2 \setminus M_1$ imply $(g, m) \notin I$,
as well as $g \in G_2 \setminus G_1$ and $m \in M_1 \setminus M_2$ imply $(g, m) \notin I$. Note that
$(G_1, M_1, I_1) \subseteq (G, M, I) \supseteq (G_2, M_2, I_2)$.

Another possibility is to enlarge the set of objects. Two contexts
$K_1 := (G_1, M, I_1)$ and $K_2 := (G_2, M, I_2)$ (having the same set of attributes) can be
combined to get a context $K := (G_1 \uplus G_2, M, I_1 \cup I_2)$. The context K is called
the subposition of $K_1$ and $K_2$, and denoted by $\frac{K_1}{K_2}$. Such an operation is useful to
incrementally update the lattice $\mathfrak{B}(K_1)$ when a set $G_2$ of objects is added [25].
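
Apposition and subposition can be sketched on contexts stored as dictionaries from objects to attribute sets. This is a minimal sketch under assumptions: attribute names of the two apposed contexts are assumed to be disjoint (so that the plain union stands for the disjoint union), object names of the two subposed contexts are assumed to be disjoint, and all names and sample data are hypothetical.

```python
def apposition(k1, k2):
    """K1 | K2: same objects, attribute sets placed side by side."""
    if k1.keys() != k2.keys():
        raise ValueError("apposition needs identical object sets")
    return {g: k1[g] | k2[g] for g in k1}


def subposition(k1, k2):
    """K1 over K2: same attributes, object sets stacked."""
    if k1.keys() & k2.keys():
        raise ValueError("this sketch assumes disjoint object sets")
    return {**k1, **k2}


def extent(context, attrs):
    """Objects of the context having every attribute of attrs."""
    return {g for g, row in context.items() if attrs <= row}


# Hypothetical sample contexts.
K1 = {"g1": {"a"}, "g2": {"a", "b"}}
K2 = {"g1": {"x"}, "g2": set()}
K = apposition(K1, K2)
K_more = subposition(K1, {"g3": {"b"}})
```

On the sample data, the extent of {a, x} in the apposition is the intersection of the extent of {a} in `K1` and the extent of {x} in `K2`, which is a small instance of the property stated above.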

5.3 Generalized Patterns 

The objective here is to exploit generalization hierarchies attached to proper- 
ties to get a lattice with more abstract concepts. Producing generalized patterns 



from concept lattices when a taxonomy on attributes is provided can be done in 
different ways with distinct performance costs that depend on the peculiarities 
of the input (e.g., size, density) and the operations used. One way is to
use context apposition to conduct the assembly (join) of the initial lattice of
non-generalized attributes (e.g., destinations of airline companies) with the lattice
corresponding to the taxonomy of attributes (e.g., city, country, continent), and
then perform a projection of the resulting lattice on the generalized attributes
(e.g., country) only. Figure 5 shows generalized patterns when the attributes
Canada and US are replaced with the generalized attribute North America, and 
Mexico and Latin America are replaced with South America. 
Fig. 5. Generalizing attributes



In the following we formalize the way generalized patterns are produced. Let
K := (G, M, I) be a context. The attributes of K can be grouped together to
form another set of attributes, namely S, to get a context where the attributes
are more general than in K. For the Star Alliance example, each member is
flying to an airport located in a city. Cities are generalized to countries and
sometimes to regions or continents. Formally, S can be seen as an index set
such that $\{M_s \mid s \in S\}$ covers M. The context (G, M, I) is then replaced with
a context (G, S, J) as in the scaling process. There are mainly three ways to
express the binary relation J between the objects of G and the (generalized)
attributes of S:



(∃) $g \mathrel{J} s :\iff \exists m \in s,\ g \mathrel{I} m$. Consider an information table describing
companies and their branches in North America. We first set up a context whose
objects are companies and whose attributes are the cities where these
companies have or may have branches. If there are too many cities, we can
decide to group them in provinces (in Canada) or states (in USA) to reduce the
number of attributes. The (new) set of attributes is then a set S whose
elements are states and provinces. It is quite natural to say that a company g
has a branch in a province/state s if g has a branch in a city m which belongs
to the province/state s. Formally, g has attribute s iff there is $m \in s$ such
that g has attribute m.

(∀) $g \mathrel{J} s :\iff \forall m \in s,\ g \mathrel{I} m$. Consider an information system about Ph.D.
students and the components of the comprehensive doctoral exam. Assume
that components are: the written part, the oral part, and the thesis proposal,
and that a student succeeds in his exam if he succeeds in the three
components of the exam. The objects of the context are Ph.D. students and the
attributes are the different exams taken by students. If we group together the
different components, for example

CE.written, CE.oral, CE.proposal $\mapsto$ CE.exam,

then it becomes natural to say that a student g succeeds in his
comprehensive exam CE.exam if he succeeds in all the exam parts of CE, i.e. g has
attribute CE.exam if for all m in CE.exam, g has attribute m.

(α%) $g \mathrel{J} s :\iff \frac{|\{m \in s \mid g \mathrel{I} m\}|}{|s|} \ge \alpha_s$, where $\alpha_s$ is a threshold set by the user for
the generalized attribute s. This case generalizes the (∃)-case (take $\alpha = \frac{1}{|M|}$)
and the (∀)-case (take $\alpha = 1$). To illustrate this case, let us consider a
context describing different specializations in a given Master degree
program. For each program there is a set of mandatory courses and a set of
optional ones. Moreover, there is a predefined number of courses that a
student should pass to get a degree in a given specialization. Assume that to
get a Master in Computer Science with a specialization in "computational
logic" (CL), a student must have seven courses from a set $s_1$ of mandatory
courses and three courses from a set $s_2$ of optional ones. Then we can
introduce two generalized attributes $s_1$ and $s_2$ so that a student g succeeds in
the group $s_1$ if he succeeds in at least seven courses from $s_1$, and succeeds
in $s_2$ if he succeeds in at least three courses from $s_2$. So, for $\alpha_{s_1} := \frac{7}{|s_1|}$ and
$\alpha_{s_2} := \frac{3}{|s_2|}$ we have

$g \mathrel{J} s_i \iff \frac{|\{m \in s_i \mid g \mathrel{I} m\}|}{|s_i|} \ge \alpha_{s_i}, \quad 1 \le i \le 2$.
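
The three ways of defining J can be put side by side in a short sketch. It is only an illustration under assumptions: the function name `generalize`, its `mode` and `alpha` parameters, the sample cities and the object names are all hypothetical, and every group is assumed to be non-empty.

```python
from fractions import Fraction


def generalize(context, groups, mode="exists", alpha=None):
    """Build (G, S, J) from (G, M, I); `groups` maps each s to its attributes of M."""
    result = {}
    for g, row in context.items():
        kept = set()
        for s, members in groups.items():
            hits = len(row & members)  # attributes of s that g has
            if mode == "exists":
                ok = hits >= 1
            elif mode == "forall":
                ok = hits == len(members)
            else:  # "alpha": the share of s held by g reaches the threshold alpha[s]
                ok = Fraction(hits, len(members)) >= alpha[s]
            if ok:
                kept.add(s)
        result[g] = kept
    return result


# Hypothetical data: two objects, cities grouped into countries.
CONTEXT = {"g1": {"Toronto", "Montreal", "Boston"}, "g2": {"Boston"}}
GROUPS = {"Canada": {"Toronto", "Montreal"}, "US": {"Boston", "Chicago"}}
HALF = {"Canada": Fraction(1, 2), "US": Fraction(1, 2)}
```

Using exact fractions avoids rounding issues when a ratio sits exactly on its threshold; with thresholds equal to 1 the third mode coincides with the (∀)-case.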



Attribute generalization reduces the number of attributes. One may therefore
expect a reduction of the number of concepts (i.e., $|\mathfrak{B}(G, S, J)| \le |\mathfrak{B}(G, M, I)|$).
Unfortunately, this is not always the case, as we can see from the example in Fig. 6
below. Therefore, it is interesting to investigate (in the future) under which
condition generalizing patterns reduces the size of the initial lattice. Moreover,
finding the connections between the implications of the generalized context and the
initial one is also an important open problem to be considered.
Fig. 6. A generalization increasing the size of the lattice.




A similar reasoning can be conducted with objects (rather than attributes) to 
replace some (or all) of them with generalized objects or clusters iPTOl . In such a 
case, the extraction can be done using an assembly of lattices having the same 
set of attributes, followed by a selection on generalized objects. 

6 Approximation of Presumed Concepts: Real World concepts vs. 
Formal Concepts 

While analyzing data, one may have a particular interest in a pair (X, Y) made of
a set X of objects and a set Y of attributes. X can be for example some Star
Alliance members and Y some destinations. The interest of the analyst is then
represented by the pair (X, Y), here called presumed concept, as perceived by a
user. A crucial question is then how a presumed concept can be approximated by
formal concepts of a context K. Given a presumed concept c := (X, Y), we set
p_ext(c) := X and p_int(c) := Y to mean the presumed "extent" and "intent" of c,
respectively. We assume that there is a formal context K := (G, M, I) whose object
set contains p_ext(c) and whose attribute set contains p_int(c). To approximate c, we
first look at those concepts whose extent (resp. intent) has a non-empty
intersection with p_ext(c) (resp. p_int(c)). This is done via selective (resp. projective)
representations.

A presumed concept c := (X, Y) has a projective conceptual representation
$\pi_c : (A, B) \mapsto (f_2(B \cap p\_int(c)), B \cap p\_int(c))$, and a selective conceptual
representation $\xi_c : (A, B) \mapsto (A \cap p\_ext(c), f_1(A \cap p\_ext(c)))$, defined on formal
concepts (A, B) of $\mathfrak{B}(K)$. The inverse image of c under $\pi_c$ (i.e. $\pi_c^{-1}(c)$)
contains all formal concepts whose intents contain the p_intent of c; the inverse
image of c under $\xi_c$ (i.e. $\xi_c^{-1}(c)$) contains all formal concepts whose extents
contain the p_extent of c. In fact $\pi_c^{-1}(c)$ is a principal ideal of $\mathfrak{B}(K)$
generated by $H(c) := (p\_int(c)', p\_int(c)'')$, and $\xi_c^{-1}(c)$ is a principal filter of $\mathfrak{B}(K)$
generated by $L(c) := (p\_ext(c)'', p\_ext(c)')$. Then L(c) is the smallest
concept whose extent contains the p_extent of c and H(c) is the largest concept
whose intent contains the p_intent of c. Therefore L(c) (respectively H(c)) is
the extensional (resp. intensional) approximation for c. If c is a concept, then
L(c) = c = H(c).

A very interesting case is when c is a preconcept of $\mathfrak{B}(K)$ [28] (i.e. $X \subseteq Y'$
or equivalently $Y \subseteq X'$). Then $X \times Y \subseteq I$ and $L(c) \le H(c)$. In this case c
is a rectangle full of crosses and any element of [L(c), H(c)] is a maximal
rectangle full of crosses that contains c. An approximation of c is then the interval
[L(c), H(c)].

A presumed concept c can be erroneous. In this case, c is not a preconcept of
K. There are some instances in the p_extent of c that do not have all attributes
in the p_intent of c and some attributes in the p_intent of c that do not
belong to all the instances of the p_extent of c. We call such a presumed concept
degenerated since there is no formal concept of K that covers c.

Let us assume that c is a preconcept. Then, the interval of all concepts
containing c can be computed based on filters and ideals. We first compute the closure
X'' of the p_extent of c. We get the concept L(c) = (X'', X'), i.e., the
smallest concept containing c. The filter $\uparrow L(c)$ contains all concepts whose extent
contains the p_extent of c. Next, we compute the closure Y'' of the p_intent of c,
and get the concept H(c) = (Y', Y''), that is the largest concept containing c.
The ideal $\downarrow H(c)$ contains all concepts whose intent contains the p_intent of c.
The approximation of c is then $[L(c), H(c)] = \uparrow L(c) \cap \downarrow H(c)$ and can be read
easily on the lattice $\mathfrak{B}(K)$.
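
The computation of L(c) and H(c) for a preconcept can be sketched as follows. The three-object context, the function names and the convention of returning `None` for a degenerated presumed concept are assumptions made for this illustration only.

```python
# Hypothetical toy context: object -> attributes.
CONTEXT = {"g1": {"a", "b", "c"}, "g2": {"a", "b"}, "g3": {"a"}}
ATTRIBUTES = {"a", "b", "c"}


def up(objs):
    """X -> X': attributes shared by all objects of X."""
    return frozenset(m for m in ATTRIBUTES if all(m in CONTEXT[g] for g in objs))


def down(attrs):
    """Y -> Y': objects having all attributes of Y."""
    return frozenset(g for g, row in CONTEXT.items() if attrs <= row)


def approximate(x, y):
    """Return (L(c), H(c)) for the presumed concept c = (x, y), or None if degenerated."""
    if not set(y) <= up(x):  # c is not a preconcept: some object of x lacks an attribute of y
        return None
    low = (down(up(x)), up(x))     # L(c) = (X'', X')
    high = (down(y), up(down(y)))  # H(c) = (Y', Y'')
    return low, high


def in_interval(concept, low, high):
    """Membership of a concept in [L(c), H(c)], tested on extents."""
    return low[0] <= concept[0] <= high[0]
```

For instance, `approximate({"g1", "g2"}, {"a"})` returns the two bounds of the interval, while `approximate({"g3"}, {"b"})` returns `None` because g3 does not have attribute b.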

Figure 7 shows that the preconcept c = ({Air Canada, Lufthansa}, {Canada,
Europe}) is not a formal concept and highlights the interval [L(c), H(c)] that
approximates c, where L(c) = ({Air Canada, Lufthansa}, {Canada, Europe, Middle
East, Asia-Pacific, US, Latin America, Mexico}) and H(c) = ({Air Canada,
Lufthansa, Air New Zealand, The Austrian Airlines Group, Singapore Airlines,
United Airlines}, {Canada, Europe, Asia-Pacific, US}).

Fig. 7. Approximation of a preconcept in the lattice shown in Figure 2.

7 Related Work

Three main approaches towards handling the overwhelming size of mined knowledge
(mainly rules) are proposed: (i) constraint-based rule mining which aims
to reduce the DM output by imposing constraints on the premise or the con- 
sequent of association rules H14I17L (ii) rule filtering using quality measures 
(e.g., support) and concise representations B23I12L and (iii) querying the DM 
output B4I3I2I6I11I24L In 0, the selection and discovery of actionable formal 
concepts from a pattern basis are studied, using constrained-based data mining 
and fault-tolerant pattern generation procedures. 

Recent studies on pattern management [5, 24] provide a uniform framework
for data and pattern management and define links between data and pattern
spaces through bridging operations and cross-over queries such as finding data
covered by a given pattern or identifying patterns related to a data set. Although
many studies limit the management of patterns to association rules only, work
conducted by Calders et al. [5] and Terrovitis et al. [24] covers different types of
patterns. In [24], a pattern base management system is defined for storing,
processing and querying patterns. Moreover, languages for pattern definition and
manipulation are proposed, and temporal aspects of patterns are handled. In Q, 
a data mining algebra and a 3-World model are defined, as well as a small set 
of data mining primitive operators are proposed to further formulate complex 
queries. The proposed model includes three worlds: D-World for data definition 
and manipulation (e.g., projection, join), I-World for region (set of constraints) 
definition and manipulation, and E-World for operations on data contained in 



regions. On the industry side, work was mainly done to design languages for 
pattern description, manipulation and exchange (e.g., PMML). 

8 Conclusion 

In this paper we have proposed a set of operators for filtering, manipulating, and 
approximating a set of concepts using formal concept analysis. To date, we have 
analyzed and implemented four algebraic operations on lattices: the selection on 
a lattice according to a conjunctive condition on attributes, the projection of a 
lattice on a set of attributes, the assembly of two lattices related to the same set of
objects, and the approximation of a presumed concept c within a given related 
lattice. 

Our future work concerns the following issues: (i) enrich the defined alge- 
braic operators on concepts with additional ones borrowed from FCA theory, 
(ii) define new cross-over (mapping) operations between a data space (e.g., a re- 
lational table) and a pattern space expressed by a set of concepts, and (iii) study 
variants of the two kinds of mappings with their corresponding actual benefits 
in real-life applications. 